The Kinetic Energy of Hydrocarbons as a Function of Electron Density and Convolutional Neural Networks

Kun Yao and John Parkhill


We demonstrate a convolutional neural network trained to reproduce the Kohn-Sham kinetic energy of hydrocarbons from electron density. The output of the network is used as a non-local correction to the conventional local and semi-local kinetic functionals. We show that this approximation qualitatively reproduces Kohn-Sham potential energy surfaces when used with conventional exchange correlation functionals. Numerical noise inherited from the non-linearity of the neural network is identified as the major challenge for the model. Finally we examine the features in the density learned by the neural network to anticipate the prospects of generalizing these models.


1 Introduction 


The ground state energy is determined by the electron density $n(r)$. However, the overwhelming majority of 'DFT' applications use the Kohn-Sham (KS) formalism, which instead yields the energy as a functional of a non-interacting wavefunction. KS-DFT incurs a significant computational overhead in large systems because the minimal Kohn-Sham electronic state uses at least one wavefunction for each electron. In orbital-free (OF) DFT only one function, the density $n(r)$, is needed. Indeed, existing OF software packages are able to treat systems roughly an order of magnitude larger than efficient Kohn-Sham implementations on modest computer hardware. The total energy in OF-DFT can be written as:

\[ E_{\mathrm{total}} = T[n(r)] + E_{\mathrm{nu-el}}[n(r)] + E_{\mathrm{hartree}}[n(r)] + E_{\mathrm{xc}}[n(r)] + E_{\mathrm{nu-nu}} \tag{1} \]


Because inexpensive density functionals are known for the other components of the electronic energy, one must only provide an accurate kinetic energy functional ($T[n(r)]$) to enjoy the computational advantages of OF-DFT. In this paper we follow a totally naive and empirical route to approximations of $T[n(r)]$, based on convolutional neural networks, which we call CNN-OF. Our approximation has useful accuracy, and it is able to predict bonding and shell structure. It is designed to be compatible with existing Kohn-Sham exchange-correlation functionals and the algorithms used to evaluate them. The functional is non-local and does not invoke a pseudo-potential. We also examine the features of the functional to infer features of the kinetic energy.

Several groups have developed quantitative approximations to the Born-Oppenheimer potential energy surface (BO-PES) using neural networks and other machine learning techniques. In terms of theoretical detail our functional lies between Kohn-Sham DFT and these machine-learning approximations to the BO-PES. CNN-OF has the advantage that it models a property of the electron density rather than of the molecular geometry, and so it generalizes between molecules. It could also be used to predict density-dependent properties and produce density embeddings for Kohn-Sham. OF-DFT is positioned to become an inexpensive approximation to Kohn-Sham theory, and is promising in multi-resolution schemes.

Accelerating progress is being made towards accurate kinetic energy functionals; a complete review is beyond the scope of this paper. Most kinetic functionals descend from the local Thomas-Fermi (TF) or semi-local von Weizsäcker (VW) functionals, which are exact for the uniform electron gas and for one-orbital systems, respectively. They can be written as:

\[ T_{\mathrm{TF}} = \int C_{\mathrm{TF}} \, n(r)^{5/3} \, dr, \qquad T_{\mathrm{VW}} = \frac{1}{8} \int \frac{|\nabla n(r)|^2}{n(r)} \, dr \tag{2} \]


where $C_{\mathrm{TF}}$ equals $\frac{3}{10}(3\pi^2)^{2/3}$. The modifications to these functionals can be roughly classified by their locality, the first group being semi-local approximations based on gradient information. The accuracy of modern generalized gradient approximation (GGA) kinetic functionals has remarkably reached roughly 1% for atoms. However, existing GGAs have many qualitative failures: they do not predict the shell structure of the density, and they often fail catastrophically to predict even the strongest chemical bonds. Given the large magnitude of the kinetic energy relative to the XC energy, even this performance is impressive.
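As a small worked illustration of Eq. 2, the sketch below evaluates both functionals for the exact 1s density of the hydrogen atom, a one-orbital system for which the VW functional should return the exact kinetic energy of 0.5 Hartree. The radial grid, the trapezoid quadrature and all function names are assumptions made for this illustration and are not taken from the code used in this work.

```python
import numpy as np

# Thomas-Fermi constant in atomic units: (3/10) * (3 pi^2)^(2/3)
C_TF = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)

def radial_integral(f, r):
    """Integrate a spherically symmetric function f(r) over all space (trapezoid rule)."""
    g = 4.0 * np.pi * r**2 * f
    return float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(r)))

def hydrogen_density(r):
    """Exact hydrogen 1s density, n(r) = exp(-2r) / pi (illustrative test density)."""
    return np.exp(-2.0 * r) / np.pi

def thomas_fermi(n, r):
    """T_TF = integral of C_TF * n^(5/3)."""
    return radial_integral(C_TF * n ** (5.0 / 3.0), r)

def von_weizsacker(n, dn_dr, r):
    """T_VW = (1/8) * integral of |grad n|^2 / n."""
    return radial_integral(dn_dr**2 / (8.0 * n), r)

r = np.linspace(1e-6, 30.0, 200001)   # hypothetical radial grid, in bohr
n = hydrogen_density(r)
dn_dr = -2.0 * n                      # analytic radial derivative of the 1s density

t_tf = thomas_fermi(n, r)
t_vw = von_weizsacker(n, dn_dr, r)
print(f"T_TF = {t_tf:.4f} Ha, T_VW = {t_vw:.4f} Ha")
```

For this density the VW value matches the exact 0.5 Hartree, while TF recovers only about 0.29 Hartree, which gives a sense of how large a correction an enhancement factor has to supply.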

The second class are non-local functionals of the density, and we can distinguish two important sub-types: two-point functionals based on a relation between density response and kinetic energy, and empirical functionals based on the kernel method (KM) of machine learning. When combined with pseudo-potentials, the non-local functionals usefully predict bulk properties of metals and semiconductors. Modifications such as angular momentum dependence further improve the accuracy of these functionals. However, to our knowledge there is limited evidence that the two-point functionals are practical for strongly inhomogeneous organic or biological material, and they depend on a pseudo-potential to avoid core electrons. The empirical KM functionals appeared more recently. Ground-breaking studies of kernel method functionals have been restricted to 1-dimensional models of molecules, but have demonstrated several promising features, including bonding behavior.

This work is related to the KM approach, but makes several different design choices:

• Our functional takes the form of an enhancement function, $F$, for a hybrid of the TF and VW functionals, and is locally integrated like an ordinary GGA xc-functional, although $F$ is non-local.

• We use Convolutional Neural Networks (CNNs) rather than KM to learn the enhancement functional.

• Like the KM functionals, but unlike the two-point functionals, CNN-OF is evaluated in real space, and no pseudopotential (PS) is required.

• Our functional targets the positive semi-definite, non-interacting Kohn-Sham kinetic energy density $\tau_+$.

We explain the motivation and impact of each of these design choices in the remainder of this paper. The fourth choice constitutes an approximation, also made by Kohn-Sham calculations, that the kinetic contribution to correlation is negligible. For our OF functional to be practically useful, it must be compatible with existing KS functionals, and so this approximation is a useful expedient that could be relaxed in future work. Note that there is no unique kinetic energy density; modeling $\tau_+$ has practical advantages discussed below.

There are known conditions of the exact XC functional that are not satisfied by the approximate XC functionals in common use today. Likewise there are known features of the exact kinetic functional that are not satisfied by this work, including density scaling relations, response relations and asymptotic limits. We expect that enforcing these conditions will be key to future development. Within our scheme it is a simple matter to enforce these physical constraints with data that reflects the constraint, just as rotational invariance is enforced in image classification by generating rotated versions of input data. We defer a more complete investigation of these constraints to other work, and acknowledge from the outset that our functional is neither exact nor uniquely defined, even based on its training set. This paper simply establishes that convolutional neural networks are able to predict the Kohn-Sham kinetic energy in real molecules from the density.

Fig. 1 The orange dot is the quadrature point ($r'$) at which the functional is being evaluated. The two lines which sample the normalized reduced gradient intersecting this point are input to the network, which consists of three convolutional and two fully connected layers. The final layer outputs $F(r')$.

2 Functional Form 

Like most exchange correlation functionals, the form of our non-local kinetic energy is based on a local kinetic energy density that is exact for a physical model system (VW or TF), multiplied by a unitless non-local enhancement functional $F$, defined as:

\[ F_{\mathrm{TF(VW)}}(r, n(r')) = \tau_+(r) / \tau_{\mathrm{TF(VW)}}(r) \tag{3} \]

where $n(r')$ is a sample of the density near position $r$, centered there, $\tau_+(r) = \frac{1}{2} \sum_i |\nabla \phi_i(r)|^2$ is the positive-semidefinite Kohn-Sham kinetic energy density built from the occupied orbitals $\phi_i$, and $\tau_{\mathrm{TF(VW)}}$ is the local TF or VW kinetic energy density. The total kinetic energy can be written as:

\[ T[n(r)] = \int F_{\mathrm{TF(VW)}}(r, n(r')) \, \tau_{\mathrm{TF(VW)}}(r) \, dr \tag{4} \]

Fig. 2 Top: $F$ prediction errors (unitless) of CNNs with different numbers of convolutional layers for the valence region, as a function of learning epochs. An epoch is one pass of gradient steps over the whole training set. CNNs with more convolutional layers produce less training error. However, the test error saturates at three layers. Bottom: Learning curves of CNNs with different input types. A CNN using the raw density as input is less accurate than a CNN using the normalized $s$ as input.

In principle, the total kinetic energy should be the exact interacting kinetic energy of the whole molecule, but in practice the non-interacting kinetic energy is used in Kohn-Sham calculations. Our goal is to produce a kinetic functional which is compatible with existing KS functionals. We chose $\tau_+$ since it is everywhere positive, which avoids numerical problems, and the ratios of $\tau_+$ to $\tau_{\mathrm{TF}}$ and $\tau_{\mathrm{VW}}$ are well-behaved functions. Ideally, $F_{\mathrm{TF}}$ should equal 1 for any constant input function, and $F_{\mathrm{VW}}$ should equal 1 for any one-orbital system. We have examined several different choices of $F_{\mathrm{TF}}$ vs $F_{\mathrm{VW}}$, as described later on.
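To make the enhancement factors of Eq. 3 concrete, the following sketch evaluates them analytically for the hydrogen 1s orbital, the simplest one-orbital system. It is an illustration under stated assumptions (a hypothetical radial grid, a trapezoid quadrature, and the function name `hydrogen_enhancement`), not the grid-based implementation used for the results in this paper.

```python
import numpy as np

# Thomas-Fermi constant in atomic units: (3/10) * (3 pi^2)^(2/3)
C_TF = 0.3 * (3.0 * np.pi**2) ** (2.0 / 3.0)

def hydrogen_enhancement(r):
    """Return (F_TF, F_VW, tau_TF) on a radial grid for the hydrogen 1s orbital."""
    phi = np.exp(-r) / np.sqrt(np.pi)     # the single occupied orbital
    n = phi**2                            # density n = |phi|^2
    tau_plus = 0.5 * phi**2               # 0.5 * |d(phi)/dr|^2, because d(phi)/dr = -phi
    tau_tf = C_TF * n ** (5.0 / 3.0)      # local TF kinetic energy density
    tau_vw = (2.0 * n) ** 2 / (8.0 * n)   # |grad n|^2 / (8 n), with |grad n| = 2 n
    return tau_plus / tau_tf, tau_plus / tau_vw, tau_tf

r = np.linspace(1e-6, 20.0, 100001)       # hypothetical radial grid, in bohr
f_tf, f_vw, tau_tf = hydrogen_enhancement(r)

# Eq. 4 by radial quadrature: T = integral of F_TF * tau_TF over all space.
g = 4.0 * np.pi * r**2 * f_tf * tau_tf
kinetic_energy = float(np.sum(0.5 * (g[1:] + g[:-1]) * np.diff(r)))
print(f"T = {kinetic_energy:.4f} Ha")
```

In this one-orbital case $F_{\mathrm{VW}}$ is identically 1, whereas $F_{\mathrm{TF}}$ grows monotonically with distance from the nucleus; integrating $F_{\mathrm{TF}} \tau_{\mathrm{TF}}$ recovers the exact kinetic energy of 0.5 Hartree.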

2.1 Convolutional Neural Networks 

If the set of orbitals generating the density is known, calculating the non-interacting kinetic energy is trivial. This observation suggests that $T[n(r)]$ can be thought of as recognizing the orbitals leading to the density, and that the most robust available statistical models for recognition are a logical choice for kinetic energy functionals. CNNs have emerged over the past few decades as the most powerful models for classification of image data. Previous 1D machine-learning work has employed a KM, kernel ridge regression (KRR), to learn the kinetic energy functional. The main strength of the KM relative to a CNN is a straightforward, deterministic learning process. The main drawback is difficulty scaling to large amounts of data with high dimensionality. Neural networks (NNs), and in particular convolutional neural networks, are known for their ability to digest high-dimensional data and vast data sets. The universal approximation theorem shows that neural networks are capable of approximating arbitrary continuous functions on compact subsets, a flexibility gained for the price of non-linearity.

NNs are compositions of vector-valued functions separated into layers of neurons. Each layer linearly transforms a vector of input ($y$) with vectors of learned parameters (weights $w$ and biases $b$). A non-linear activation function ($f$) is then applied to the result and yields the output value for the neuron, for example:

\[ y^l_m = f\left(b^l + \sum_n y^{l-1}_n w^l_{nm}\right) \tag{5} \]

where $y^l_m$ is the value of neuron $m$ in layer $l$, $f$ is the non-linear activation function, $b^l$ is the bias of layer $l$, and $w^l_{nm}$ is the weight of the connection between neuron $n$ in layer $l-1$ and neuron $m$ in layer $l$. The activation function used in this work is the rectified linear unit (ReLU): $f(x) = \max(0, x)$. Our neural network consists of more than five layers which bear parameters (Fig. 1). The weights and biases of the network are 'learned' by minimizing the prediction error of the network over a training set by gradient descent.
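The layer update of Eq. 5 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the GPU code used in this work; the function names, the toy weights, and the choice of one bias per output neuron are assumptions made for the example.

```python
import numpy as np

def relu(x):
    """Rectified linear unit, f(x) = max(0, x)."""
    return np.maximum(0.0, x)

def dense_layer(y_prev, weights, bias):
    """Apply Eq. 5: y^l_m = f(b_m + sum_n y^{l-1}_n w_{nm}).

    y_prev  : (n_in,)        outputs of the previous layer
    weights : (n_in, n_out)  connection weights w_{nm}
    bias    : (n_out,)       one bias per output neuron (an assumption here)
    """
    return relu(bias + y_prev @ weights)

# Toy example: three inputs feeding two neurons (values chosen arbitrarily).
y0 = np.array([1.0, 2.0, 0.5])
w1 = np.array([[0.2, -1.0],
               [0.4, 0.3],
               [-0.6, 0.8]])
b1 = np.array([0.1, -0.2])

y1 = dense_layer(y0, w1, b1)
print(y1)
```

The first neuron receives a pre-activation of 0.8 and passes it through unchanged, while the second receives -0.2 and is clipped to zero by the ReLU, which is the source of the network's non-linearity.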

Convolutional neural networks are a constrained form of NNs, inspired by the structure of the animal visual cortex. They are appropriate for data like images (and electron density) which have hierarchical local structure. They improve NNs by eliminating redundant parameters in the model. Each convolutional layer, as shown in Fig. 1, contains a certain number of 'filters', which are local collections of neurons with fewer weights than inputs. These filters are convolved with the input data to produce output; for a filter of size $p \times q$ the result is:

\[ y^l_{mn} = f\left(b^l + \sum_{a=0}^{p-1} \sum_{b=0}^{q-1} w_{ab} \, y^{l-1}_{(m+a)(n+b)}\right) \tag{6} \]

where $y^l_{mn}$ is the value of the neuron in layer $l$ at position $(m, n)$, and $w_{ab}$ is the weight matrix of the filter. Since the weight matrices are shared across positions in convolutional layers, the number of parameters in a convolutional model is significantly reduced relative to a simple network. Each filter learns a separate sort of feature, and several filters are used in a layer, which adds a new index of summation into Eq. 6. Obtaining the network's output involves a series of tensor contractions reminiscent of the coupled-cluster equations. Both models derive flexibility by allowing non-linear dependence on parameter vectors. Convolutional layers are able to distill structure and improve the robustness of the NNs for object recognition.
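A direct, unoptimized transcription of Eq. 6 for a single filter is sketched below. The function name `conv_layer`, the toy input and the difference filter are illustrative assumptions; production CNN libraries implement the same operation with batched GPU kernels and many filters per layer.

```python
import numpy as np

def conv_layer(y_prev, w, bias):
    """Apply one p x q filter as in Eq. 6, followed by a ReLU.

    The filter is only placed where it fits entirely inside the input,
    so the output is smaller than the input by (p - 1, q - 1).
    """
    p, q = w.shape
    rows = y_prev.shape[0] - p + 1
    cols = y_prev.shape[1] - q + 1
    out = np.empty((rows, cols))
    for m in range(rows):
        for n in range(cols):
            # sum over a, b of w_ab * y^{l-1}_{(m+a)(n+b)}
            out[m, n] = bias + np.sum(w * y_prev[m:m + p, n:n + q])
    return np.maximum(0.0, out)

# Toy input: a 3 x 4 ramp whose entries increase by 1 along each row.
y_prev = np.arange(12, dtype=float).reshape(3, 4)

# A 1 x 2 filter that takes the difference of horizontal neighbours.
edge_filter = np.array([[-1.0, 1.0]])

y_next = conv_layer(y_prev, edge_filter, 0.0)
print(y_next)
```

Here the same two weights are reused at all nine output positions, whereas a fully connected map from the 12 inputs to 9 outputs would need 108 weights; this sharing is the parameter reduction described above.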









































2.2 Choices of Input and Network Shape 

The whole density is an impractical amount of information to feed to $F$. Compact samples of $n(r)$ are theoretically enough, but functionals based on small samples must be numerically unstable. While designing our functional we looked at the shape of $F$ for several small molecules, and also experimented with several sorts of input. Based on the structure of the exact $F$, we allow $F$ to depend on two one-dimensional lines of the density centered at $r'$. These lines are oriented towards the nearest nuclei to let the functional perceive nearby shell structure. This choice of input is arbitrary, and should be refined in future work. The bottom panel of Fig. 2 shows the results of some experiments with different versions of the density. The dimensionless gradient along the two lines mentioned above is the scheme used for our results, but we also experimented with the density, the square root of the density and other variations. The density itself does not display much shape and has a large dynamic range. The reduced gradient, $s(r) = |\nabla n(r)| / (2 k_F(r) n(r))$ with $k_F(r) = (3\pi^2 n(r))^{1/3}$, is a better choice: it is dimensionless and lies in a small range, which makes it suitable as CNN input. It clarifies shell structure, which makes it easier for the CNN to learn $F$. Indeed, one can see from Fig. 2 that using $s$ as input results in a large improvement over using the raw density as input. Each vector of $s$ fed into the network is normalized using the following local response normalization function:

\[ \hat{s}(v) = \frac{s(v)}{\left(1 + 0.01 \sum_{v'} s(v')^2\right)^{0.5}} \tag{7} \]

where $v$ is the position in the line and $v'$ runs over positions neighboring $v$. Using the normalized $s$ as input further improves the numerical stability of the network, and focuses the network on learning spatial features.
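The input preparation described above can be sketched as follows: compute the reduced gradient along a sampled line, then apply the local response normalization of Eq. 7. The window half-width of 5 points, the function names and the hydrogen-like test density are assumptions made for this illustration; only the prefactor 0.01 and the exponent 0.5 come from Eq. 7.

```python
import numpy as np

def reduced_gradient(n, grad_n_norm):
    """s = |grad n| / (2 k_F n), with k_F = (3 pi^2 n)^(1/3)."""
    k_f = (3.0 * np.pi**2 * n) ** (1.0 / 3.0)
    return grad_n_norm / (2.0 * k_f * n)

def local_response_normalize(s, half_window=5, alpha=0.01, beta=0.5):
    """Divide each s(v) by (1 + alpha * sum of s(v')^2 over nearby v')^beta.

    half_window is an assumed neighbourhood size; the window is truncated
    at the ends of the line.
    """
    s = np.asarray(s, dtype=float)
    out = np.empty_like(s)
    for v in range(s.size):
        lo = max(0, v - half_window)
        hi = min(s.size, v + half_window + 1)
        out[v] = s[v] / (1.0 + alpha * np.sum(s[lo:hi] ** 2)) ** beta
    return out

# Illustrative line through a hydrogen-like density n(r) = exp(-2r) / pi.
r = np.linspace(0.1, 5.0, 50)
n = np.exp(-2.0 * r) / np.pi
s_line = reduced_gradient(n, 2.0 * n)        # |grad n| = 2 n for this density
s_hat = local_response_normalize(s_line)

# Sanity check on a constant line: an interior point sees 11 neighbours.
flat = local_response_normalize(np.ones(21))
print(s_line.min(), s_line.max(), s_hat.max(), flat[10])
```

The normalization never increases the magnitude of an entry, and it compresses large values more strongly than small ones, which narrows the dynamic range of the input presented to the first convolutional layer.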

The performance of CNNs depends on their layer structure, but the layer design must be found by a combination of intuition and trial and error. Small networks will not have enough complexity to learn their training data. Large networks which are under-determined by their training data will eventually begin to learn the distribution of the training data instead of the desired features, spoiling their generality. The top panel of Fig. 2 shows the learning curves of three different CNNs with different numbers of convolutional layers. Predictably, a CNN with more convolutional layers has less training error. However, the test error shows the actual performance of the CNN, and the test errors of the CNNs with three and four convolutional layers are quite similar. Based on this test and the computational constraints of evaluating the network in our GPU-CNN code, we chose three convolutional layers as a production model. The performance of a fully-connected neural network and of KRR has also been tested, and the results are shown in Table S-2. The CNN has a smaller test error than both methods. The size of the CNN we settled on is summarized in Fig. 1 and Table S-1.

In this paper, we focused on training a kinetic energy functional for alkanes. The functional is trained on the equilibrium structure and ten randomized structures of each of seven molecules: butane, 2,5-dimethylhexane, 2-ethyl-3,3-dimethylpentane, octane, tetramethylbutane, 2,3,3,4-tetramethylpentane and 5-ethyl-2-methylheptane. 2% of the standard xc-grid quadrature points of each structure were sampled as training samples, which amount to 3.7 gigabytes of data. Each molecule is separated into three parts: a near-carbon core region, a valence region and a tail region. The near-carbon core region includes grid points which lie less than 0.17 bohr from the nearest carbon nucleus, and the learning target for grid points in this region is $F_{\mathrm{TF}}$. This region usually has less than 2% of the total grid points but contributes more than 20% of the total kinetic energy. Because the density in this region is also chemically inert, it is treated separately. As mentioned above, $F_{\mathrm{TF}}$ grows non-linearly at long range whereas $F_{\mathrm{VW}}$ nicely converges to 1, so $F_{\mathrm{VW}}$ was chosen as the learning target for grid points in the tail region, which lie at least 2.3 bohr from the nearest nucleus. All other grid points constitute the valence region, which is most sensitive to bonding, and the learning target for grid points in this region is $F_{\mathrm{TF}}$. Each region was trained separately but with the same CNN architecture. To learn the functional we minimize the sum-squared $F$-prediction errors over training densities as a function of the network's weights and biases. This error is itself a non-linear function, which we minimize from a random guess using stochastic gradient descent with analytic gradients provided by backpropagation.

Fig. 3 Top left panel: Exact $F_{\mathrm{TF}}$ calculated by the Kohn-Sham method. Top right panel: $F_{\mathrm{TF}}$ simulated by the trained neural network. Bottom panel: Error of the simulated $F_{\mathrm{TF}}$ compared with the accurate $F_{\mathrm{TF}}$. The neural network simulates the exact $F_{\mathrm{TF}}$ accurately, and structureless noise is the dominant error.

The training was performed on an Nvidia Tesla K80 GPU to accommodate the memory demands of the density input. L2 regularization was applied to prevent overfitting. Mini-batch stochastic gradient descent with momentum, with batches of 128 samples, was used during training. With this CNN structure and training scheme, the training wall times for each part are shown in Table S-3, and the total training wall time is 29 hours. It is worth mentioning that under the current learning scheme our learning target is the kinetic energy enhancement factor instead of the kinetic energy, and therefore the cancellation of errors in the learning objective does not lead to the cancellation of errors in the kinetic energy. This is the reason that the noisy error is uniformly distributed over grid points. These results could perhaps be further improved by manipulating the learning target.

Fig. 4 Top panel: PES with kinetic energy calculated by Kohn-Sham, CNN and APBE kinetic energy functionals along the C-C bonding coordinate (C-C bond length in Å). Bottom panel: PES with kinetic energy calculated by Kohn-Sham, CNN and APBE functionals along the C-H bonding coordinate (C-H bond length in Å). Our CNN-trained kinetic functional successfully finds the local minimum with a reasonable bond length.

3 Results


Our intended applications for these functionals are force-field-like approximations to the BO-PES, and an inexpensive method to solve for densities of large systems. We note that because the wall-time cost of evaluating $F$ at a grid point is constant regardless of grid size, and because the number of quadrature grid points in a molecule is linear in system size, this scheme is trivially linear scaling and naively parallel. We demonstrate that the functional learns properties of the kinetic energy, rather than merely the total energy, by showing its ability to predict $F$ throughout different regions of a molecule. We then show that the functional is accurate enough to predict bonding semi-quantitatively, and produces smooth PESs despite its non-linearity, by examining its accuracy in describing the bonds in ethane and in predicting the kinetic energy along a KS molecular dynamics (MD) trajectory of 2-methylpentane. We conclude by examining the features learned by the CNN. Test molecules do not occur in the training set used to optimize the CNN, differing in both their bonding and geometry. A locally interfaced hybrid of a modified BAGEL code, using the libXC library, and Cuda-Convnet was used to produce these results. A conventional pruned Lebedev atom-centered grid was used for integration of the exchange-correlation energy and the orbital-free kinetic energy, in conjunction with Becke's atomic weight scheme. The grid saturates the accuracy of most kinetic functionals to better than a microHartree. We note that in our hands the kinetic energy saturates with grid more rapidly than typical XC functionals. The B3LYP exchange correlation functional and the 6-31G* basis set were used in all results. Q-Chem was also used to produce some comparison results. Any quantity calculated with the CNN is obtained in single-precision arithmetic; the fact that $\tau_+$ maintains sign is useful in this regard.


3.1 Prediction of F 

The bonding curve test for ethane includes 9 images each for C-C bonding and C-H bonding, which contain 954,000 grid points in total. The MD trajectory of 2-methylpentane includes 17 steps, which contain 3,706,000 grid points in total. There are 533,760 training samples, each consisting of 2000 inputs, and there are more than 800,000 parameters in the CNN. The test samples that we predict outnumber the training samples by a factor of eight. After training a CNN to reproduce the non-local enhancement factor, we wanted to establish whether there were any local trends in the accuracy of its prediction. To measure this we plot the $F$ produced by our learned model alongside the 'accurate' Kohn-Sham enhancement factor. The accurate $F_{\mathrm{TF}}$ and the one generated by the trained CNN for ethane in the C-C-H plane are shown in Figure 3. As one can see, the simulated $F_{\mathrm{TF}}$ is smooth and reproduces all the fine structure of the Kohn-Sham $F_{\mathrm{TF}}$ surface, including the shell structure of the carbon atom and the singularity of $F_{\mathrm{TF}}$ near the carbon core. The root-mean-square deviation between the accurate and simulated $F_{\mathrm{TF}}$ of ethane at its equilibrium structure is 0.03 (unitless). Considering the range of $F_{\mathrm{TF}}$ (between 0 and 13), the stochastic nature of our minimization, and the single-precision arithmetic used, the CNN's $F_{\mathrm{TF}}$ is remarkably accurate. The error takes the form of relatively uniform noise distributed throughout the volume of the molecule. This noisy error is the greatest challenge facing CNN-OF, especially because noise near the core, where almost all the density lies, can render the functional chemically inaccurate. This noise is intimately related to the non-linearity of the CNN. It is known from image recognition that NN predictions are inherently unstable to infinitesimal perturbations. One can imagine several remedies: solving for an ensemble of networks, training on adversarial examples, or denoising. We will show in the following sections that although noise is the dominant problem, it does not preclude chemical applications of CNN-OF.

Fig. 5 PES with kinetic energy calculated by Kohn-Sham, CNN and APBE kinetic energy functionals along 17 steps of a Kohn-Sham molecular dynamics trajectory of 2-methylpentane. For better comparison, the APBE and the CNN curves are translated so that they start at the same point as the KS curve. The PES calculated with the CNN kinetic energy reproduces the KS PES qualitatively; the relative errors are similar to the difference between hybrid B3LYP and GGA PBE.

3.2 Bonding and Potential Energy Surfaces 

Poor prediction of chemical bonds and bond energies has kept OF-DFT out of the mainstream of computational chemistry, although the errors in existing kinetic functionals are relatively small. Generally the magnitude of the kinetic energy decreases as atoms are drawn apart, and the failure to bond is simply due to the large contribution of the kinetic energy and a small error in its slope. As Figure 4 shows, the GGA-based APBE kinetic functional (which is amongst the best kinetic functionals available in libXC) fails to predict the C-H bond and the C-C bond in ethane. The bonding curves generated with our trained kinetic energy functional successfully predict local minima for both the C-C bond and the C-H bond, and the curves are smooth, especially in the vicinity of the minimum. Both the predicted C-C and C-H bond lengths lie within 50 milli-Angstroms of the KS value.

3.3 Accuracy along an MD trajectory 

A functional would be useless if it were only accurate in the vicinity of stationary geometries. To test generalization of our functional away from the minimum, we examined a section of a high-temperature MD trajectory. Note that we have not implemented nuclear gradients of CNN-OF. This section uses a KS nuclear trajectory, and asks whether CNN-OF can produce a qualitatively correct surface for the KS geometries when several atoms are moving at once. Seventeen consecutive steps of a molecular dynamics trajectory of 2-methylpentane obtained at 1800 K are sampled. In our current implementation this is still a demanding task, because the densities at each point, which amount to tens of gigabytes of data, are stored and processed. The CNN kinetic functional captures the general shape of the PES, including the positions of the maximum and stationary points, although the curvature is imperfect. The error the CNN incurs is comparable to the error resulting from replacing B3LYP by PBE. Based on this trial and the previous bonding curves, we believe that CNNs are quite promising for the prediction of Kohn-Sham kinetic energies.

3.4 What the Network Learns 

In order to gain some insight into the neural network we can examine the weights in the lowest convolutional layers. These weight vectors are 'features' recognized in the density by the CNN. For example, in a network trained to identify images of people, the weights of low-lying convolutional layers look like patterns observed in images (edges and corrugation). The weight vectors of higher layers take on the shapes of increasingly complex objects in images (eyes, hands etc.). The smooth structure of the weight vectors is entirely due to the learning process, because the network is initialized with random numbers. Features also tend to be nearly orthogonal, and by observing features we can diagnose over-parameterization, because excess filters do not lose their noisy character even when learning is complete.

For this experiment we trained a network on the Madelung wavefunction, $n(r)^{1/2}$, for simplicity. The lowest feature layers correspond directly to the density of the molecule, and the higher levels learn abstract representations of $\tau_+$ features that are difficult to interpret. Looking at the weight matrices of the lowest convolutional layer (Fig. 6), the non-locality of the kinetic functional is superficially obvious. Most features extend several angstroms away from the point at which the enhancement is being evaluated. We can also infer that the real-space size of the sample we feed the network (10 Å) is adequate, since weights at the edges are near zero, and that the network has inferred some locality in the dependence of $\tau_+$ on the density. The non-locality of the weights corresponds to the improvement of the two-point functionals over GGAs. We also see that we have basically saturated the number of features we can learn from our data: small noisy oscillations persist in the weight vectors.

Fig. 6 Left panel: Madelung wavefunction (inset: molecular orbitals) along the sampled line of ethane (top: line pointing to the carbon atom, bottom: line pointing to the hydrogen atom). Middle panel: weights of the filters in the first convolutional layer. Right panel: curves after the density is transformed by the filters. The transformed density near the carbon atom shows obvious nodal structure, while the transformed hydrogen density does not.

Looking at the output of the lowest convolutional layer, we can also see how the network is able to distinguish the shell structure of different atoms using its convolutional filters. We can see from Figure 6 that even though the densities along the lines pointing to the hydrogen atom and the carbon atom have a similar single-peak shape, they become much more distinguishable after the transformation of the first convolutional layer. When shown the sample from carbon's density the network produces outputs with many nodes, while the inputs pointing to hydrogen's density have none. Subsequent layers can easily detect these edges in their input to discriminate atoms. Ultimately, to describe many materials, the network must learn the shell structures of several atoms; this basic classification is the first step in that process.


4 Discussion and Conclusions 

We have shown that an empirically trained Convolutional Neural Network can usefully predict the Kohn-Sham kinetic energy of real hydrocarbons given their density. Our scheme can be practically integrated with existing functionals and codes without pseudo-potentials. The network is able to learn non-local structure and predict bond lengths despite being constructed in a naive way. We have shown (as predicted) that roughness brought about by the non-linearity of the network is the key challenge to further development, but that useful accuracy is already possible. There are several avenues to improve on the model, for example enforcing physical constraints, improving the numerical precision, and improving the data used to train the network. For the foreseeable future OF-DFT will, at best, be an inexpensive approximation to Kohn-Sham, and so the computational cost of evaluating the functional must be considered alongside these improvements.

We thank The University of Notre Dame's College of Science and Department of Chemistry and Biochemistry for generous start-up funding, and Nvidia Corporation for a grant of processors used for the work.